Loan Default Prediction on Large Imbalanced Data Using Random Forests
Authors
Abstract
In this paper, we propose an improved random forest algorithm that assigns a weight to each decision tree in the forest when the trees' votes are aggregated for prediction. The weights are easily calculated from the out-of-bag errors measured during training. Experimental results show that the proposed algorithm outperforms the original random forest and other popular classification algorithms, such as SVM, KNN, and C4.5, in terms of both balanced and overall accuracy. Experiments also show that parallel random forests can greatly improve the efficiency of the learning process.
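The weighted aggregation described in the abstract can be illustrated with a minimal sketch. The abstract does not give the weight formula, so the rule used here (weight = 1 minus the tree's out-of-bag error) is only an assumption, and the function names `tree_weights` and `weighted_vote`, as well as the numbers, are hypothetical.

```python
from collections import defaultdict


def tree_weights(oob_errors):
    """Turn each tree's out-of-bag error rate into a voting weight.

    Assumption: weight = 1 - OOB error. The paper's actual formula
    is not stated in the abstract.
    """
    return [1.0 - err for err in oob_errors]


def weighted_vote(tree_predictions, weights):
    """Return the class label whose supporting trees have the largest total weight."""
    totals = defaultdict(float)
    for label, weight in zip(tree_predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# Hypothetical forest of five trees; label 1 = default, 0 = no default.
oob_errors = [0.05, 0.05, 0.45, 0.45, 0.45]
predictions = [1, 1, 0, 0, 0]

weights = tree_weights(oob_errors)
print("weighted vote:", weighted_vote(predictions, weights))
print("unweighted vote:", weighted_vote(predictions, [1.0] * len(predictions)))
```

In this made-up case the two accurate trees outvote the three weak ones, which a plain majority vote would not allow. Any monotonically decreasing function of the out-of-bag error could replace the assumed rule.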
Similar resources
A Linear-dependence-based Approach to Design Proactive Credit Scoring Models
The main aim of a credit scoring model is to classify loan customers into two classes, reliable and unreliable, on the basis of their potential capability to keep up with their repayments. Nowadays, credit scoring models are increasingly in demand, due to the growth of consumer credit. Such models are usually designed on the basis of the past loan applications and used to e...
An experimental comparison of classification algorithms for imbalanced credit scoring data sets
In this paper, we set out to compare several techniques that can be used in the analysis of imbalanced credit scoring data sets. In a credit scoring context, imbalanced data sets frequently occur as the number of defaulting loans in a portfolio is usually much lower than the number of observations that do not default. As well as using traditional classification techniques such as logistic regre...
Investigating the missing data effect on credit scoring rule based models: The case of an Iranian bank
Credit risk management is a process in which banks estimate the probability of default (PD) for each loan applicant. Data sets of previous loan applicants are built by gathering their data; these internal data sets are usually completed with external credit bureau data and finally used by banks to estimate PD. There is also a continuing interest among banks in using rule based classifiers to b...
Predicting Probability of Loan Default, Stanford University, CS 229 Project Report
Stanford University, CS229 Project report. Jitendra Nath Pandey, Maheshwaran Srinivasan. 12/15/2011. Abstract: Extending credit to individuals is necessary for markets and societies to function smoothly. Estimating the probability that an individual would default on his/her loan is useful for banks to decide whether to sanction a loan to the individual and is also useful for borrowers to make bet...
INDUCING VALUABLE RULES FROM IMBALANCED DATA: THE CASE OF AN IRANIAN BANK EXPORT LOANS